Elucidating the effects of environmental factors on the spread of the novel coronavirus disease (COVID-19) is vital. This methodological study compares three models, (1) artificial neural networks (ANN), (2) an adaptive neuro-fuzzy inference system (ANFIS) and (3) a classical multiple linear regression (MLR) model, for determining the relationship between the spread of COVID-19 and environmental factors (temperature, humidity and wind). The data were obtained from a previous study (Pirouz, Haghshenas, Haghshenas, & Piro, 2020) of confirmed COVID-19 cases in Wuhan, China, with temperature, humidity and wind as the independent variables. The measured and predicted results were compared using three performance indices: root mean square error (RMSE), determination coefficient (R²) and correlation coefficient (R). The results showed that ANFIS and ANN are more promising than the classical MLR model, with R-values above 0.90 in both the calibration and verification stages. ANFIS outperformed both MLR and ANN, improving on their determination coefficients by up to 25% and 9%, respectively, for the prediction of confirmed COVID-19 cases in Wuhan. Overall, the results demonstrate the reliability of AI-based models (ANFIS and ANN) for simulating COVID-19 cases from various environmental variables.
1. INTRODUCTION
The novel coronavirus SARS-CoV-2, which causes COVID-19, belongs to the coronaviruses, enveloped RNA viruses that are widely distributed among humans, birds and other mammals and cause respiratory, enteric, hepatic and neurologic illness. COVID-19 is an emerging and re-emerging pandemic infectious disease of global public health concern [1]. It first appeared in December 2019 in Wuhan, China, where the first four cases in the world were reported; these may be connected to the Huanan (South China) Seafood Wholesale Market. Subsequently, cases of person-to-person transmission, including asymptomatic transmission, increased rapidly [2]. What began as a regional epidemic quickly grew into a global pandemic, affecting 212 countries with high morbidity and mortality and infecting more than 3.7 million people [3].

The mode of transmission of COVID-19 may be the same as for other respiratory diseases, which can be transmitted through droplets of various sizes. A study of 75,465 subjects showed that COVID-19 is transmitted between individuals mainly through droplet and contact routes rather than by airborne transmission [4], although other modes of transmission may be possible. Researchers are investigating the mechanisms and modes of transmission of this virus [5]. In early 2020, the disease spread quickly around the world. In light of the potential danger of this pandemic, researchers and medical experts have been working to understand this new infection and its pathophysiology in order to identify likely treatment regimens, effective therapeutic agents and a vaccine [6].

Artificial intelligence models have already been applied to disease prediction. For instance, Colak et al. performed a retrospective case-control study in which 124 patients diagnosed with coronary artery disease (CAD) by coronary angiography (at least one coronary stenosis >50% in a major epicardial artery) were enrolled, and 113 subjects with angiographically normal coronary arteries served as controls. Multilayer perceptron artificial neural network (MLP-ANN) architectures were applied. The ANN models, trained with various learning algorithms, were run on 237 records divided into training (n=171) and testing (n=66) datasets. Prediction performance was assessed by sensitivity, specificity and accuracy according to standard definitions. The results showed that ANN models trained with eight different learning algorithms are promising owing to their high (greater than 71%) sensitivity, specificity and accuracy in predicting CAD. For training, accuracy, sensitivity and specificity ranged over 83.63%-100%, 86.46%-100% and 74.67%-100%, respectively. For testing, the values exceeded 71% for sensitivity, 76% for specificity and 81% for accuracy [7]. Similarly, Fazilic et al. applied ANFIS, which has been tested in several disease-prediction studies, to the prediction of dermatological diseases [8].

The environment in which the COVID-19 virus is suspended can significantly influence its survival and transmission. Although the respiratory virus has been shown to be transmitted from human to human, either by inhaling aerosols sneezed or coughed out by an infected person or by touching contaminated surfaces and passing the droplets to the eyes, mouth or nose (the T zone), it is important to recognize that the ability of the virus to survive differs greatly between surfaces [9]. In general, viruses, including coronaviruses, have been shown to survive for long periods on objects outside the body of the host organism. At room temperature the virus can survive for days, whereas at higher temperatures it survives for a much shorter period. The virus can survive for hours on sterile surfaces, aluminum or surgical gloves, which increases the probability of infection via contact, and exhaled droplets can remain as aerosols for some time, enabling human-to-human transmission over a distance through air movement (the wind effect). Transmission by the fecal route may also be possible, since some fecal samples from infected individuals have tested positive for the virus. One study showed that the virus can survive for 4 days in stool and remain infectious in sewage and water for weeks, which suggests that the role of contaminated sewage in transmission of the disease needs further investigation [10]. Over the past five to six decades, numerous articles in the technical literature have described the effect of environmental factors such as temperature, relative humidity and wind on the survival and spread of viral agents. The survival and transmission of an airborne infection depend on the dissemination of the virus from the index person and its transfer to a secondary host, and environmental factors play a major role along this transmission chain [11]. In this context, Pirouz et al. reported that artificial intelligence and regression analysis are strong and reliable tools for predicting the COVID-19 outbreak from various environmental factors [1]. To our knowledge, this is the first study in the literature to apply ANN and ANFIS together to predict the COVID-19 outbreak from various environmental factors.

A major reason for applying these models is that, owing to the dynamic nature of the measured data, a single conventional model may not be sufficient to generate consistent predictions. Modellers therefore need to develop efficient and robust models from the data currently available.

According to studies in the literature, traditional regression methods have been the most widely employed approaches, but they have lower precision and sensitivity. This creates the need for robust non-linear AI-based techniques [12].

This work aims to compare two non-linear models (ANN and ANFIS) with a classical linear model (MLR) for predicting the COVID-19 outbreak in Wuhan, China, using environmental factors such as wind, temperature and humidity as input variables.


2. MATERIAL AND METHOD 
2.1. Instrumentation 
2.1.1. Humidity 

Humidity is measured with a digital hygrometer and a built-in position sensor. The sensor comprises a polyimide film spin-coated onto a Si substrate. A sensor model used for signal processing describes, in closed form, the sensor capacitance as a function of relative humidity and temperature [13].


2.1.2. Temperature 

A thermometer is an instrument for measuring the temperature of liquids, solids and gases; in this study it is used to measure air temperature [14].


2.1.3. Wind 
The instrument used to measure wind speed in this experiment is an anemometer, a weather instrument that measures wind speed and direction [15].


2.2. Proposed methodology 

In this study, the data were taken from the historical experimental results of a previous study [1] and used to predict the COVID-19 outbreak from different environmental factors by applying the non-linear models ANN and ANFIS and the linear model MLR. The data in [1] were collected to investigate the environmental factors that affect the number of confirmed coronavirus cases, that is, to identify the relationship of temperature, humidity and wind speed with confirmed cases.

Five input variables were used: maximum temperature, minimum temperature, average temperature, average humidity and wind speed (km/h). The number of confirmed cases is the output parameter. The data were collected over a period of 30 days and comprise 64 instances for each variable. Figure 1 shows the flowchart of the AI-based models and experimental methods applied in this research.
Figure 1. Flowchart of the AI-based models and experimental methods applied

The flowchart summarizes the overall study, starting with the experimental analysis that determines the values of temperature, humidity and wind as the input variables. An instrumentalist checks the results to determine whether there is an instrumental error, as shown in the flowchart. The study then proceeds with the data-driven approach through pre-processing, which involves preliminary data analysis such as correlation and statistical analysis. Finally, the models are applied and their predictive performance is evaluated, as shown in Figure 1.


2.3. Artificial neural networks (ANN) 

Artificial neural networks (ANNs) are computational data-driven models that emulate how the human brain interprets and processes information. They are composed of interconnected neurons, or processing units, with adjustable weights and biases [16]. In the technical literature, ANNs are described as information-processing systems designed after the human brain, whose basic unit is the node (neuron) [17]. This study employs the backpropagation algorithm.

In backpropagation, the error is computed as the difference between the simulated and measured values, as expressed in (1):

$$e_n = V_{s,n} - V_{m,n} \qquad (1)$$

where $e_n$ is the error for the $n$th sample, and $V_{s,n}$ and $V_{m,n}$ are its simulated and measured values.
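As a minimal sketch of how the error in (1) drives backpropagation, the NumPy code below trains a one-hidden-layer network with plain gradient descent on the mean squared error. The hidden-layer size, tanh activation, learning rate and the helper names `train_mlp` and `predict` are illustrative assumptions, not the configuration used in this study (which used the Levenberg-Marquardt algorithm in MATLAB).

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network with backpropagation and gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Small random initial weights; biases start at zero
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output neuron
        H = np.tanh(X @ W1 + b1)
        y_sim = H @ W2 + b2
        # Error as in (1): simulated minus measured
        e = y_sim - y
        # Backward pass: gradients of the mean squared error mean(e**2)
        g_out = 2.0 * e / n
        gW2, gb2 = H.T @ g_out, g_out.sum(axis=0)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
        gW1, gb1 = X.T @ g_hid, g_hid.sum(axis=0)
        # Gradient-descent update of weights and biases
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def predict(params, X):
    """Forward pass with trained parameters; returns a 1-D array of outputs."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

# Usage on a hypothetical toy 1-D problem (inputs already scaled to a small range)
X = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
y = np.sin(X).ravel()
params = train_mlp(X, y)
mse = np.mean((predict(params, X) - y) ** 2)
```

In practice the inputs and output would be normalized before training, since the raw case counts in this study are in the thousands.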


2.4. Adaptive neuro-fuzzy inference system (ANFIS) 

ANFIS has proved to be a successful technique that incorporates the Sugeno fuzzy model and combines the benefits of fuzzy logic and ANNs in one system. It has recently been used to model and predict complex datasets [18]. ANFIS is also a universal estimator because of its capacity to approximate real functions. Fuzzy logic converts the input data into fuzzy values by applying membership functions, whose values range between 0 and 1 [19]. In the ANFIS model, nodes act as membership functions (MFs) and allow the relationship between the inputs and the output to be modelled.

Assume the fuzzy inference system (FIS) has two inputs, $x$ and $y$, and one output, $f$. A first-order Sugeno fuzzy model then has the following rules:

Rule 1: if $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$ (2)
Rule 2: if $x$ is $A_2$ and $y$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$ (3)

where $A_1$, $B_1$, $A_2$ and $B_2$ are the membership functions for inputs $x$ and $y$, and $p_1, q_1, r_1, p_2, q_2, r_2$ are the parameters of the output functions. The structure of ANFIS follows a five-layer neural network arrangement.
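The five layers can be illustrated with a minimal forward pass for the two-rule, first-order Sugeno system in (2) and (3). This sketch assumes Gaussian membership functions, product firing strengths and hypothetical parameter values; it shows how the rules are evaluated but omits the training step (in standard ANFIS, a hybrid of least squares and gradient descent) that tunes these parameters.

```python
import math

def gauss_mf(v, c, s):
    """Layer 1 (fuzzification): Gaussian membership degree in (0, 1]."""
    return math.exp(-0.5 * ((v - c) / s) ** 2)

def anfis_forward(x, y, mfs_x, mfs_y, consequents):
    """Output of a first-order Sugeno system with rule i: if x is A_i and y is B_i."""
    # Layer 2: firing strength of each rule, w_i = mu_Ai(x) * mu_Bi(y)
    w = [gauss_mf(x, *a) * gauss_mf(y, *b) for a, b in zip(mfs_x, mfs_y)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: rule outputs f_i = p_i*x + q_i*y + r_i
    f = [p * x + q * y + r for p, q, r in consequents]
    # Layer 5: overall output is the weighted sum of the rule outputs
    return sum(wb * fi for wb, fi in zip(w_bar, f))

# Hypothetical parameters: (centre, width) for A_1, A_2 and B_1, B_2, then (p_i, q_i, r_i)
mfs_x = [(0.0, 1.0), (10.0, 1.0)]
mfs_y = [(0.0, 1.0), (10.0, 1.0)]
consequents = [(1.0, 2.0, 3.0), (-1.0, 0.5, 0.0)]
out_near_rule1 = anfis_forward(0.0, 0.0, mfs_x, mfs_y, consequents)  # f_1 = 3.0
out_midway = anfis_forward(5.0, 5.0, mfs_x, mfs_y, consequents)      # (f_1 + f_2) / 2 = 7.75
```

Near the centres of $A_1$ and $B_1$ the output is essentially $f_1$; halfway between the two rule centres both rules fire equally and the output is the average of $f_1$ and $f_2$.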


2.5. Multiple linear regression (MLR) 

MLR is one of the simplest classical methods used for prediction in engineering, science, health sciences and social sciences. Linear regression is generally classified into two main groups, simple and multiple linear regression, and either can be used depending on the aim. If the study involves a single input and a single output variable, the method is known as simple linear regression (SLR); if it examines the relationship between two or more inputs and a single output, it is multiple linear regression (MLR) [20]. MLR is the most widely used type of linear regression, and it relates every input parameter to the output [21]. In general, the technique estimates the relationship between a single response (dependent) variable and two or more predictors (independent variables) [22].

The general equation of MLR is shown in (4):

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_i x_i \qquad (4)$$

where $x_i$ is the value of the $i$th predictor, $b_0$ is the regression constant, and $b_i$ is the coefficient of the $i$th predictor.
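Equation (4) is usually fitted by ordinary least squares. The NumPy sketch below estimates $b_0, \dots, b_i$ with `numpy.linalg.lstsq`; the helper names `fit_mlr` and `predict_mlr` and the synthetic data are illustrative, and this is not the EViews procedure used in the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Return [b0, b1, ..., bi] minimizing the squared error of y = b0 + sum(b_i * x_i)."""
    X = np.asarray(X, dtype=float)
    # Prepend a column of ones so that b0 acts as the regression constant
    A = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs

def predict_mlr(coeffs, X):
    """Evaluate equation (4) for each row of X."""
    return coeffs[0] + np.asarray(X, dtype=float) @ coeffs[1:]

# Synthetic data generated exactly from y = 1 + 2*x1 + 3*x2
X = np.array([[0, 1], [1, 0], [2, 3], [3, 5], [4, 2]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
coeffs = fit_mlr(X, y)  # approximately [1, 2, 3]
```

Because nothing constrains the linear predictor, it can return negative values for some inputs, which matters when the output is a count such as confirmed cases.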


2.6. Model validation 

In any computational data-driven approach, the basic aim is to fit the model to a given dataset, as judged by the employed indicators, so that it produces satisfactory predictions for unseen data [1]. Because of issues such as overfitting, good calibration performance does not always agree with verification performance. Several validation approaches exist, of which cross-validation is the most popular; k-fold cross-validation, employed in this study, is one example. In this method, the data are randomly divided into two sets, used for the calibration and verification phases [2]. An advantage of this approach is that in each round the training and validation sets are independent of each other, which gives a more reliable estimate of performance accuracy [23]. The data are divided into 75% for the calibration (training) stage and 25% for the verification (testing) stage, corresponding to 4-fold cross-validation. Other validation methods can also be applied to the dataset [24].
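A 4-fold split of the kind described above can be sketched as follows: each fold holds out about 25% of the instances for verification and uses the remaining 75% for calibration. The shuffling seed and the helper name `k_fold_indices` are assumptions for illustration; the 64 instances in the usage example match the dataset size reported in Section 2.2.

```python
import numpy as np

def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)  # random assignment of instances to folds
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# With 64 instances, each round trains on 48 and verifies on 16
splits = list(k_fold_indices(64, k=4))
```

Every instance appears in exactly one verification fold, so averaging the indices over the four rounds uses all the data for both calibration and verification.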


3. RESULTS AND DISCUSSION 
3.1. Models development 

The models were developed and simulated in MATLAB 9.3 (R2019a). For the ANN model, the Levenberg-Marquardt training algorithm was used with 1,000 iterations, a momentum coefficient of 0.9, a learning rate of 0.01 and an MSE goal of 0.0001. The best model architecture was selected by trial and error. For ANFIS, different numbers of epochs and types of membership functions (MFs) were tested to identify a suitable model architecture. The deterministic linear MLR model was developed in EViews 9.5.
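For readers unfamiliar with Levenberg-Marquardt, its core is a damped Gauss-Newton step, delta = (JᵀJ + λI)⁻¹ Jᵀ r, where J is the Jacobian of the model with respect to its parameters and r = measured − predicted. To keep the sketch short it fits a hypothetical two-parameter curve y = a·exp(b·x) rather than a neural network; the model, the damping schedule (×0.1 on an accepted step, ×10 on a rejected one) and the helper name `lm_fit` are assumptions, and the MATLAB implementation used in the study differs in its details.

```python
import numpy as np

def lm_fit(x, y, theta0, iters=100, lam=1e-2):
    """Fit y ≈ a * exp(b * x) with Levenberg-Marquardt (illustrative sketch)."""
    theta = np.asarray(theta0, dtype=float)

    def model(t):
        return t[0] * np.exp(t[1] * x)

    def jacobian(t):
        # Partial derivatives of the model with respect to a and b
        e = np.exp(t[1] * x)
        return np.column_stack([e, t[0] * x * e])

    sse = np.sum((y - model(theta)) ** 2)
    for _ in range(iters):
        r = y - model(theta)
        J = jacobian(theta)
        # Damped Gauss-Newton step: (J^T J + lam * I) delta = J^T r
        delta = np.linalg.solve(J.T @ J + lam * np.eye(len(theta)), J.T @ r)
        candidate = theta + delta
        new_sse = np.sum((y - model(candidate)) ** 2)
        if new_sse < sse:
            # Accept the step and move closer to pure Gauss-Newton
            theta, sse, lam = candidate, new_sse, lam * 0.1
        else:
            # Reject the step and move closer to gradient descent
            lam *= 10.0
    return theta

# Noise-free synthetic data from a = 2.0, b = 1.5
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x)
theta = lm_fit(x, y, theta0=[1.0, 0.5])
```

A large λ gives small, gradient-descent-like steps that are safe far from the solution, while a small λ gives fast Gauss-Newton steps near it, which is why the method is popular for training small networks.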


3.2. Applications of the data driven approaches result 

AI-based models (ANFIS and ANN) and a linear model (MLR) were employed to predict the effects of environmental factors on the COVID-19 outbreak in Wuhan, China, based on historical data. Prior to modelling, statistical and correlation analyses were conducted (Table 1) to understand the behaviour of the historical data and the relationships among all the variables involved, especially between the dependent (output) and independent (input) variables. These analyses characterize the data before the simulation stage. The statistical analysis reports the mean, median, standard deviation, minimum and maximum of each variable in this study, as shown in Table 1.
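The descriptive statistics and correlation coefficients in Table 1 can be computed with NumPy as sketched below. The data array is a small hypothetical placeholder, not the Wuhan dataset, and the use of the sample standard deviation (`ddof=1`) is an assumed convention. `np.corrcoef` returns Pearson coefficients; a Spearman analysis, which the Table 1 caption also mentions, would instead correlate the ranks of each column.

```python
import numpy as np

def describe(data, names):
    """Mean, median, standard deviation, minimum and maximum of each column."""
    summary = {}
    for j, name in enumerate(names):
        col = data[:, j]
        summary[name] = {
            "mean": float(col.mean()),
            "median": float(np.median(col)),
            "std": float(col.std(ddof=1)),  # sample standard deviation (assumed convention)
            "min": float(col.min()),
            "max": float(col.max()),
        }
    return summary

def pearson_matrix(data):
    """Pearson correlation coefficient between every pair of columns."""
    return np.corrcoef(data, rowvar=False)

# Hypothetical toy data: 4 instances of three variables (not the Wuhan data)
names = ["temp", "humidity", "cases"]
data = np.array([[1.0, 2.0, 10.0],
                 [2.0, 4.0, 8.0],
                 [3.0, 6.0, 6.0],
                 [4.0, 8.0, 4.0]])
stats = describe(data, names)
R = pearson_matrix(data)  # R[0, 1] ≈ 1.0 and R[0, 2] ≈ -1.0 for this toy data
```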

As shown in Table 1, the strongest correlation with confirmed cases is an inverse correlation with Max T °C, with R = -0.53401. This supports the hypothesis proposed by various scientists and medical experts that the virus may be less viable at higher temperatures or in temperate regions. Table 1 also shows a moderate direct relationship between average humidity and confirmed cases, with R = 0.382997. The weakest correlation is between wind and confirmed cases, with R = 0.072453.

Based on the comparative prediction results, the AI-based models (ANN and ANFIS) clearly outperform the traditional linear regression model (MLR). Table 2 also indicates that all three models perform well in the training phase, with R² values of at least 0.9076, whereas MLR is less reliable in the testing phase because of its lower determination coefficient (0.74337) and higher root mean square error (235.3869). Overall, ANFIS outperformed the other two models, improving on the testing-phase determination coefficients of MLR and ANN by up to 25% and 9%, respectively, and it achieved the lowest RMSE in both phases, as shown in Figure 2.
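The three performance indices can be computed as sketched below. Here R is the Pearson correlation between measured and simulated values and R² is its square; this matches Table 2, where each R² equals the square of the corresponding R within rounding (for example, 0.862189² ≈ 0.74337 for MLR in testing). Other studies define R² as 1 − SSE/SST, which can differ when predictions are biased. The helper names are illustrative.

```python
import numpy as np

def rmse(measured, simulated):
    """Root mean square error between measured and simulated values."""
    m = np.asarray(measured, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((m - s) ** 2)))

def corr_r(measured, simulated):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(measured, simulated)[0, 1])

def r_squared(measured, simulated):
    """Determination coefficient taken as R**2 (the convention Table 2 follows)."""
    return corr_r(measured, simulated) ** 2

# A prediction with a constant bias of +1
measured = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
```

For this biased prediction R and R² are both 1 while the RMSE is 1.0, which is one reason to report RMSE alongside the correlation-based indices.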


Table 1. Statistical and spearman pearson correlation analysis 
Statistical analysis
Variable             Max T (°C)  Min T (°C)  Avg T (°C)  Avg RH (%)  Wind (km/h)  Confirmed cases
Mean                 15.38       1.39        8.30        77.90       5.37         1604.97
Median               14.70       0.30        7.35        78.15       4.10         1710.50
Standard deviation   4.31        4.03        4.19        8.69        3.26         832.56
Minimum              8.20        -2.70       2.20        55.50       2.30         349.00
Maximum              24.90       10.60       18.10       91.30       17.60        3156.00

Correlation analysis
Variable             Max T (°C)  Min T (°C)  Avg T (°C)  Avg RH (%)  Wind (km/h)  Confirmed cases
Max T (°C)           1
Min T (°C)           0.634798    1
Avg T (°C)           0.835269    0.781245    1
Avg RH (%)           0.11413     0.243575    0.13875     1
Wind (km/h)          0.345658    0.282814    0.149187    0.332678    1
Confirmed cases      0.53401     0.35121     0.49456     0.382997    0.072453     1


Prediction of the effects of environmental factors towards COVID-19 outbreak... (Khalid Mahmoud) 


40 i) ISSN: 2252-8938 


Table 2. Results of the models 
           Training                          Testing
Model      R²       RMSE       R             R²         RMSE       R
MLR        0.9076   630.2573   0.9527        0.74337    235.3869   0.862189
ANN        0.9559   435.4262   0.9777        0.907096   141.6266   0.952416
ANFIS      0.9997   34.1108    0.9999        0.998279   19.27619   0.999139
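The 25% and 9% improvements quoted for ANFIS correspond to the differences between the testing-phase determination coefficients in Table 2, expressed in percentage points. The short check below uses only the Table 2 values.

```python
# Testing-phase R^2 values copied from Table 2
r2_test = {"MLR": 0.74337, "ANN": 0.907096, "ANFIS": 0.998279}

# Improvement of ANFIS over each model, in percentage points of R^2
gain_over_mlr = (r2_test["ANFIS"] - r2_test["MLR"]) * 100
gain_over_ann = (r2_test["ANFIS"] - r2_test["ANN"]) * 100

print(f"ANFIS vs MLR: {gain_over_mlr:.1f} points")  # ANFIS vs MLR: 25.5 points
print(f"ANFIS vs ANN: {gain_over_ann:.1f} points")  # ANFIS vs ANN: 9.1 points
```

Expressed as relative changes instead, the same differences are about 34% over MLR and 10% over ANN, so it helps to state which convention a reported improvement uses.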


Figure 2. Root mean square error of the models in both training and testing phases (bar chart, RMSE axis from 0 to 700; series: training, testing)


The predictive results are also illustrated with scatter plots of the measured versus simulated values in Figure 3. The plots show that ANFIS and ANN achieve closer agreement between measured and simulated values. The more robust prediction of confirmed COVID-19 cases is related to the higher correlation among the variables shown in Table 1.
Figure 3. Scatter plots for (a) MLR, (b) ANN and (c) ANFIS
Apart from the determination coefficients (R²) in Table 2, an advantage of ANFIS and ANN over the traditional MLR model is that MLR often fails when it encounters highly complex non-linear data, because it fits a linear relationship using the least-squares method. In addition, MLR can generate negative predictions, which can hinder the performance of the model. Figure 4 shows the performance of the models for confirmed COVID-19 cases in Wuhan, China, as a radar chart of R in both the training and testing stages.
Figure 4. Radar chart of the confirmed COVID-19 cases in Wuhan City of China


In terms of predictive performance, the models rank ANFIS > ANN > MLR for the prediction of confirmed COVID-19 cases in Wuhan, China, from various environmental variables. Figure 5 shows time-series plots of the measured and predicted values; the spread between them is consistent with the results in Table 2. This result is in line with [25-27].
Figure 5. Time series plots of the measured and simulated confirmed COVID-19 cases


4. CONCLUSION 

COVID-19 is a global threat that needs to be addressed rapidly given its negative impact worldwide. Predicting and simulating the effects of environmental factors on the COVID-19 outbreak using artificial intelligence (AI) is therefore of paramount importance. In this work, three models (ANFIS, ANN and MLR) were employed to simulate the effects of these environmental factors on confirmed COVID-19 cases. The data were collected from a previously published historical study, referenced in the Material and Method section. The comparative results demonstrated the advantage of the AI-based models (ANFIS and ANN) over the traditional regression model (MLR) in predicting confirmed cases. The results also showed the strength of ANFIS as a hybrid model: it outperformed the other two models, improving on their predictive accuracy by up to 25% in the testing phase. Overall, the non-linear models displayed higher prediction accuracy than the classical linear model and can therefore be regarded as reliable for predicting the effects of environmental factors on the COVID-19 outbreak. Other non-linear models, such as support vector machines (SVM), Hammerstein-Wiener (HW) models and fuzzy logic (FL), as well as optimization algorithms such as genetic algorithms (GA), are recommended to improve the modelling accuracy. The approach is not restricted to China; simulating the effects of environmental factors on the number of COVID-19 cases in other parts of the world is highly recommended.